added unzipping functionality for mimic iv example #31
Review skipped: Auto reviews are disabled on base/target branches other than the default branch.
Walkthrough: This update enhances the MIMIC-IV pipeline by introducing an optional parameter in
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- MIMIC-IV_Example/joint_script.sh (1 hunks)
Files skipped from review due to trivial changes (1)
- MIMIC-IV_Example/joint_script.sh
Can you make this flag controlled, so that in the joint script someone can have this not be run if desired? Purely as a bash script thing, I mean, just with an optional 5th arg that must be "do_unzip=true" or "do_unzip=false" or something? This will cost a lot of disk, so some folks may not want to do it, is all.
@Oufattole can you merge the changes in main to this, make it go into dev instead of main, and make this flag controlled via the shell parameters? Then I'm happy to merge it.
MIMIC-IV_Example/joint_script.sh
Outdated
@@ -42,6 +42,9 @@ shift 4
echo "Running pre-MEDS conversion."
./MIMIC-IV_Example/pre_MEDS.py raw_cohort_dir="$MIMICIV_RAW_DIR" output_dir="$MIMICIV_PREMEDS_DIR"

echo "Unzipping the pre-MEDS gunzip csv files."
for file in ${MIMICIV_PREMEDS_DIR}/*/*.csv.gz; do gzip -d --force "$file"; done
actually, you probably want to do this on the raw data, right? That way the pre-meds can benefit from faster processing too?
Done!
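One way the raw-data unzip suggested above might look, as a minimal sketch rather than the code that actually landed in this PR. The function name `unzip_csv_gz` is hypothetical, and using `find` in place of the glob loop from the diff is a substitution: `find` reaches nested directories and does nothing when no file matches, whereas an unmatched glob is passed to `gzip` as a literal pattern.

```shell
# Hypothetical helper (not from the PR): decompress every .csv.gz file
# found anywhere under the directory given as the first argument.
unzip_csv_gz() {
    # -exec ... {} + batches the matched files into as few gzip calls as
    # possible and runs nothing at all when there are no matches.
    # --force overwrites an existing decompressed file, as in the PR's loop.
    find "$1" -type f -name '*.csv.gz' -exec gzip -d --force {} +
}
```

Under these assumptions the joint script would call `unzip_csv_gz "$MIMICIV_RAW_DIR"` before the pre-MEDS step, so that both pre-MEDS and the later stages read uncompressed files.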
2723b3e to 1ba3135
Actionable comments posted: 2
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- MIMIC-IV_Example/joint_script.sh (1 hunks)
Additional comments not posted (1)
MIMIC-IV_Example/joint_script.sh (1)
39-39: LGTM! The parameter DO_UNZIP is correctly introduced. The default value ensures backward compatibility.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@Oufattole the test failure here is due to a polars versioning issue. I'm going to fix it directly then you can merge that change in and we'll push this.
MIMIC-IV_Example/joint_script.sh
Outdated
@@ -36,9 +36,17 @@ MIMICIV_RAW_DIR="$1"
MIMICIV_PREMEDS_DIR="$2"
MIMICIV_MEDS_DIR="$3"
N_PARALLEL_WORKERS="$4"
DO_UNZIP="${5:-do_unzip=false}" # Default to 'do_unzip=false' if not provided, this uses more disk space but reduces memory usage
You need to change one more thing here -- this script is set up so that the first 4 args are mandatory, and all others are sent to the scripts as arguments via the "$@" thing. Right now, your DO_UNZIP will absorb all of those other kwargs. You should either:
- Make sure that DO_UNZIP only absorbs the fifth argument if the arg there starts with --do_unzip or something.
- or -
- Make there be a mandatory 5th arg that is DO_UNZIP and then replace the shift 4 with shift 5 so that it isn't used again below in the "$@"
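The first option above could be sketched roughly as follows. This is an illustration under stated assumptions, not the fix that was merged: the function `parse_joint_args` and the variable `N_CONSUMED` are hypothetical names, and the accepted spellings `do_unzip=true` / `do_unzip=false` are taken from the earlier request in this thread rather than from the final script.

```shell
# Hypothetical sketch: take the 4 mandatory arguments, then consume a 5th
# argument only when it is exactly do_unzip=true or do_unzip=false.
# Any other 5th argument is left alone so it still reaches "$@" downstream.
parse_joint_args() {
    if [ "$#" -lt 4 ]; then
        echo "Error: expected at least 4 arguments." >&2
        return 1
    fi
    MIMICIV_RAW_DIR="$1"
    MIMICIV_PREMEDS_DIR="$2"
    MIMICIV_MEDS_DIR="$3"
    N_PARALLEL_WORKERS="$4"

    DO_UNZIP="false"   # default: leave the .csv.gz files compressed
    N_CONSUMED=4       # how many arguments the caller should shift away
    case "${5:-}" in
        do_unzip=true|do_unzip=false)
            DO_UNZIP="${5#do_unzip=}"   # strip the prefix, keep true/false
            N_CONSUMED=5
            ;;
    esac
}
```

A caller would then run `parse_joint_args "$@" || exit 1` followed by `shift "$N_CONSUMED"`, after which `"$@"` holds only the extra arguments meant for the downstream scripts. Reporting a count instead of shifting inside the function is a design choice of this sketch, since a `shift` inside a shell function only affects the function's own arguments.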
@Oufattole I fixed the test issue but added one comment on your param. You should also update the display help function for this param.
Actionable comments posted: 0
Outside diff range, codebase verification and nitpick comments (2)
terminology.md (2)
14-20: Consider replacing repetitive word. The words ‘observations’ and ‘observing’ are quite similar. Consider replacing ‘observing’ with a different word to improve readability.
- such as observing a diagnostic code being applied to the patient, observing a patient's admission or transfer from one unit to another, observing a laboratory test result
+ such as noting a diagnostic code being applied to the patient, recording a patient's admission or transfer from one unit to another, observing a laboratory test result
Tools
LanguageTool
[style] ~17-~17: The words ‘observations’ and ‘observing’ are quite similar. Consider replacing ‘observing’ with a different word.
Context: ...vations can take on many forms, such as observing a diagnostic code being applied to the ...(VERB_NOUN_SENT_LEVEL_REP)
33-36: Simplify the phrase "at any point in time". The phrase "at any point in time" is redundant. Consider simplifying it to "at any time".
- that can be interpreted as being applicable to the patient at any point in time during their care.
+ that can be interpreted as being applicable to the patient at any time during their care.
Tools
LanguageTool
[style] ~36-~36: This phrase is redundant. Consider writing “point” or “time”.
Context: ... being applicable to the patient at any point in time during their care. #### A _time-derive...(MOMENT_IN_TIME)
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- terminology.md (1 hunks)
Additional context used
Markdownlint
terminology.md
7-7: Expected: h2; Actual: h4
Heading levels should only increment by one level at a time(MD001, heading-increment)
Additional comments not posted (5)
terminology.md (5)
7-13: LGTM! The definition of "vocabulary index" is clear and well-explained.
22-25: LGTM! The definition of "event" or "patient event" is clear and well-explained.
27-31: LGTM! The definition of "event index" is clear and well-explained.
38-43: LGTM! The definition of "time-derived measurement" is clear and well-explained.
45-48: LGTM! The definition of "dynamic measurement" is clear and well-explained.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##              dev      #31      +/-   ##
==========================================
+ Coverage   84.85%   92.48%   +7.63%
==========================================
  Files          17       24       +7
  Lines        1096     1664     +568
==========================================
+ Hits          930     1539     +609
+ Misses        166      125      -41
☔ View full report in Codecov by Sentry.
@Oufattole, @EthanSteinberg discovered that this won't work even as implemented here b/c |
…ted input file encodings so that unzipping works. Also made the bash arg (I think) work. Also removed rootutils to help address #114
Re-opened with changes to |
Resolves issue #30
Summary by CodeRabbit
.csv.gz files, enhancing user flexibility in data handling based on their storage preferences.